Monte Carlo Sampling Methods for Approximating Interactive POMDPs

نویسندگان

  • Prashant Doshi
  • Piotr J. Gmytrasiewicz
چکیده

Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent’s belief about the physical world, about beliefs of other agents, and about their beliefs about others’ beliefs. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity – sometimes also called the curse of history – we utilize a complementary method based on sampling likely observations while building the look ahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse’s impact substantially. We provide experimental results and chart future work.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Decayed Markov Chain Monte Carlo for Interactive POMDPs

To act optimally in a partially observable, stochastic and multi-agent environment, an autonomous agent needs to maintain a belief of the world at any given time. An extension of partially observable Markov decision processes (POMDPs), called interactive POMDPs (I-POMDPs), provides a principled framework for planning and acting in such settings. I-POMDP augments the POMDP beliefs by including m...

متن کامل

Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs

Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multiagent environment, extending POMDPs to multi-agent settings by including models of other agents in the state space and forming a hierarchical belief structure. In order to predict other agents’ actions using I-POMDP, we propo...

متن کامل

Monte-Carlo Planning in Large POMDPs

This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent’s belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, MonteCarlo sampling is used to break the curse of dimensionality both during belief state updates and during planning...

متن کامل

Monte Carlo POMDPs

We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sa...

متن کامل

Monte-Carlo Expectation Maximization for Decentralized POMDPs

We address two significant drawbacks of state-ofthe-art solvers of decentralized POMDPs (DECPOMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • J. Artif. Intell. Res.

دوره 34  شماره 

صفحات  -

تاریخ انتشار 2009